class: title-slide

<div class="title-card">
  <h1>
    Identifying Screening-Relevant Context in OSA Study Recruitment Using Clinical Note Metadata and LLM-Extracted Signals
  </h1>
  <p class="small-title-meta">
    Ashley Batugo · BMIN503/EPID600 · December 4, 2025
  </p>
</div>

---

# Problem & Motivation

.pull-left[
### Clinical Reality
- Recruitment is a major bottleneck
- Key information lives in free-text EHR notes
- Chart review is time-intensive
- CRC decisions rely on:
  - Formal exclusions
  - Informal skip signals

*Motivated by recruitment challenges in an OSA clinical study I'm part of*
]

.pull-right[
### Informatics Opportunity
- Use LLMs to extract exclusion context from notes
- Use metadata to identify high-yield contexts
- Reduce unnecessary chart review
- Support CRC workflows
]

---

class: section-slide, middle, center

<div class="section-card">
  <h1>Central Project Question</h1>
  <p class="section-subtitle">
    Which note-level metadata are most associated with screening exclusions in a sleep study?
  </p>
</div>

---

# Study Cohort / Data Scope

.pull-left[
### Study Cohort
- Source: **CRC-maintained Excel screening report**
- Population: patients already excluded by the CRC

Exclusion categories were defined by reviewing the CRC's exclusion notes in the Excel report.

Final analytic exclusion buckets:
- Clinical contraindications
- Procedural / recent events
- Sleep-specific exclusions
]

.pull-right[
Unit of analysis: **clinical notes**
- Clinic notes (≤ 1 year before pre-screening)
- Surgical notes (all time)
- Pathology reports (all time)
]

---

class: llm-slide

# Phase 1 — LLM-Based Evidence Extraction

.panelset[
.panel[.panel-name[Objective]
**Main LLM task: for each note, decide whether it contains current exclusion-related information.**
]
.panel[.panel-name[Sample Input]
Input: de-identified notes (excluding sensitive notes)

```text
[TIME_RELATIVE_TO_PRESCREEN: 0-30d | DELTA_DAYS = 12]
NAME reports worsening daytime fatigue and loud snoring.
History of recent surgery for nasal obstruction.
```
]
.panel[.panel-name[Prompt Snapshot]
.pull-left[
```text
Role: OSA screening assistant
Input: one note + time header
Task: Is there CURRENT evidence of:
  1) Clinical contraindication
  2) Procedural / recent event
  3) Sleep-specific exclusion
Rules:
  - Use explicit text only
  - Respect temporal context
  - Do not infer missing information
Output: JSON with labels, rationale, confidence
```
]
.pull-right[
Returned per note (as JSON):
- Three exclusion flags, coded 1 (present) or 0 (absent): clinical contraindications, recent procedural events, and sleep-specific exclusions
- Rationale
- Confidence scores
]
]
.panel[.panel-name[Code Snapshot]
.pull-left[
~~~python
config = LLMClassificationConfig.for_inference(
    text_column="note_text",
    target_labels=[
        "clinical_contra",
        "procedural_recent",
        "sleep_specific"
    ]
)

predictor = LLMClassificationPredictor(
    config=config,
    client=client,
    system_prompt=system_prompt,
    task_prompt=prompt_text,
    endpoint="openai-gpt-4o-mini-chat",
    temperature=0.0
)

results = predictor.predict_batch(all_notes_final)
~~~
]
.pull-right[
- Sent each note to GPT-4o mini programmatically
- Executed in a HIPAA-compliant Databricks environment
]
]
.panel[.panel-name[Validation]
- **Patient-level coverage**
  - For each exclusion bucket, checked whether ≥ 1 note flagged the patient correctly
- **Manual review**
  - 5 patients per category
  - Confirmed LLM rationales matched the note text
- **Prompt frozen** after:
  - ≥ 80% patient-level coverage in each bucket
]
]

---

# Prompt Performance

Of the patients the CRC excluded in each bucket, how many did the LLM identify?

.pull-left[
- Clinical contraindications
  - Coverage: **98.7%**
- Procedural / recent care
  - Coverage: **94.6%**
- Sleep-specific
  - Coverage: **95.2%**
]

.pull-right[
> All categories exceeded the 80% threshold.
> Prompt frozen and applied to the full cohort.
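
As an illustrative sketch only (hypothetical data structures and patient IDs, not the project's actual pipeline), the patient-level coverage metric used above can be computed from the note-level LLM flags like this:

```python
from collections import defaultdict

def patient_level_coverage(note_flags, crc_excluded):
    """note_flags: iterable of (patient_id, bucket, flag) tuples from LLM output.
    crc_excluded: dict mapping bucket -> set of patient IDs the CRC excluded.
    Returns bucket -> fraction of CRC-excluded patients with >= 1 flagged note."""
    flagged = defaultdict(set)
    for patient_id, bucket, flag in note_flags:
        if flag == 1:
            flagged[bucket].add(patient_id)
    return {
        bucket: len(flagged[bucket] & patients) / len(patients)
        for bucket, patients in crc_excluded.items()
    }

# Toy example with made-up patient IDs: 2 of 3 excluded patients covered
notes = [
    ("p1", "clinical_contra", 1),
    ("p1", "sleep_specific", 0),
    ("p2", "clinical_contra", 0),
    ("p3", "clinical_contra", 1),
]
crc = {"clinical_contra": {"p1", "p2", "p3"}}
coverage = patient_level_coverage(notes, crc)
```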
]

---

# Phase 2 — Modeling & Interpretation

.panelset[
.panel[.panel-name[Objective]
Phase 1 output: JSON → dataframe → Databricks table → joined with note metadata → modeling dataset

Translate LLM outputs and note metadata into:
- Interpretable statistical models
- Quantified screening signals

Goal: identify which notes matter most, and when they matter.
]
.panel[.panel-name[Outcomes + Predictors]
.pull-left[
Three outcomes, each modeled separately:
1. **Overall Exclusion**
   Where does *any* exclusion information tend to appear?
2. **Clinical Contraindications**
   Where are medical exclusions usually documented?
3. **Procedural / Sleep-Related**
   Where do procedural and sleep-related exclusions appear?
]
.pull-right[
Structured metadata inputs:
- Note type
- Clinical specialty
- Encounter type
- Time window before screening

All predictors were converted to categorical factors.
]
]
.panel[.panel-name[Design Decisions]
.pull-left[
Feature preparation:
- Collapsed sparse groups
- Missing values → `"Unknown"`
- Harmonized the inpatient and outpatient note-type fields
]
.pull-right[
Imbalance handling:
- Inverse class weighting for:
  - Clinical model
  - Other model
]
]
.panel[.panel-name[Modeling]
.pull-left[
Modeling approach:
- Multivariable logistic regression (`glm()`)
- Odds ratios for interpretation
- Fit on the full dataset to maximize statistical power for signal discovery (inferential modeling, not prediction)
]
.pull-right[
Validation:
- Bootstrap resampling
- Checked: (1) directional stability; (2) signal robustness

Model specification:

```r
glm(outcome ~ note_type + clinical_specialty +
      encounter_type + time_from_prescreening,
    family = binomial)
```
]
]
]

---

# Results: Cohort & Notes Landscape

.panelset[
.panel[.panel-name[Cohort Summary]
### Study Cohort
- **2,911** de-identified notes
- **164** screened patients
- Median notes per patient: **11** (range: 1–99)
- Note date range: **Sep 2011 – Nov 2025**
]
.panel[.panel-name[Documentation Shape]
.pull-left[
]
.pull-right[
]
]
.panel[.panel-name[Care Context]
.pull-left[
]
.pull-right[
]
]
]

---

class: results-slide

# Results — Which metadata signals exclusion?

.panelset[
.panel[.panel-name[Main Result]
.pull-left[
]
.pull-right[
.interpretation[
**Main takeaways**
- Odds ratio > 1 = higher odds of exclusion
- Bars = uncertainty in effect size
- **Encounter type + recency dominate**
- Specialty has mixed strength
- Note type is consistently weak

**Across models**
- *Clinical*: encounter + specialty; recency less important
- *Other*: encounter + timing; specialty weaker
]
]
]
.panel[.panel-name[Stability through Bootstrapping]
.pull-left[
]
.pull-right[
.interpretation[
**What this shows**
- Effects are directionally stable
- Encounter type + timing = most reliable and dominant features
- Note type & specialty = less stable
- No predictors behave randomly

**How the other models compare**
- **Clinical model**: directionally stable effects in about half of the features; encounter type and specialty are the main drivers
- **"Other" model**: directionally stable effects in about half of the features; encounter type and time window are the most stable and remain the main drivers
]
]
]
]

---

class: small-text-slide

# Conclusion

.pull-left[
Main Takeaways
- LLMs can extract exclusion-relevant signals
- Note metadata can help identify high-yield notes for review
- Strongest signals:
  - Encounter type (most often hospital encounters, office visits, and telemedicine visits)
  - Proximity to pre-screening (the most recent notes contain the most exclusion-relevant information)
]

.pull-right[
Ethical Considerations
- More transparent screening
- Fewer subjective skip decisions

Limitations
- LLM outputs not fully adjudicated
- Small cohort
- Sparse specialty effects
- Findings may not generalize to other recruitment workflows

Future Directions
- Retrospective validation of LLM outputs and regression results with CRCs
- Prospective evaluation in future recruitment
]

---

class: end-slide, middle

# Thanks!
## (and special thanks to Danielle Mowery, Emily Schriver, and Paula Salvador for their assistance with this project!)

.small[
Questions or comments?
]
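
---

class: small-text-slide

# Appendix — Bootstrap Stability (illustrative sketch)

This backup slide is an illustrative sketch, not the project's actual code: it uses synthetic data and a simple unadjusted 2×2 odds ratio in Python, rather than the multivariable `glm()` fit described in Phase 2. It shows the idea behind the directional-stability check: the fraction of bootstrap resamples in which the odds ratio stays on the same side of 1 as the point estimate.

```python
import random

def odds_ratio(records):
    """records: list of (exposed, excluded) 0/1 pairs, one per patient.
    Returns the 2x2-table odds ratio with a 0.5 continuity correction."""
    a = sum(1 for e, y in records if e and y)        # exposed, excluded
    b = sum(1 for e, y in records if e and not y)    # exposed, not excluded
    c = sum(1 for e, y in records if not e and y)    # unexposed, excluded
    d = sum(1 for e, y in records if not e and not y)
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

def directional_stability(records, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples whose odds ratio lies on the
    same side of 1.0 as the point estimate."""
    rng = random.Random(seed)
    point = odds_ratio(records)
    same_side = sum(
        (odds_ratio([rng.choice(records) for _ in records]) > 1) == (point > 1)
        for _ in range(n_boot)
    )
    return point, same_side / n_boot

# Synthetic cohort: exposure (e.g., a given encounter type) raises exclusion odds
data = [(1, 1)] * 30 + [(1, 0)] * 20 + [(0, 1)] * 15 + [(0, 0)] * 35
point, stability = directional_stability(data)
```

A stability value near 1.0 suggests a directionally stable effect; values near 0.5 would suggest the effect's direction is unreliable.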